21 research outputs found

    Recognizing point clouds using conditional random fields

    Detecting objects in cluttered scenes is a necessary step for many robotic tasks and facilitates the interaction of the robot with its environment. Because of the availability of efficient 3D sensing devices such as the Kinect, methods for the recognition of objects in 3D point clouds have gained importance in recent years. In this paper, we propose a new supervised learning approach for the recognition of objects from 3D point clouds using Conditional Random Fields, a type of discriminative, undirected probabilistic graphical model. The various features and contextual relations of the objects are described by the potential functions in the graph. Our method allows for learning and inference from unorganized point clouds of arbitrary sizes and shows significant benefit in terms of computational speed during prediction when compared to a state-of-the-art approach based on constrained optimization.
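    For reference, a pairwise Conditional Random Field of the kind described above factorizes the label distribution into unary and pairwise potentials; the paper's concrete potential functions are not reproduced here, so the following is only the generic form, in LaTeX:

        P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i} \phi_u(y_i, x) + \sum_{(i,j) \in \mathcal{E}} \phi_p(y_i, y_j, x) \Big)

    Here y_i is the label of segment i, the unary potentials \phi_u encode segment features, the pairwise potentials \phi_p over the edge set \mathcal{E} encode contextual relations between objects, and Z(x) is the partition function.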

    Action recognition based on efficient deep feature learning in the spatio-temporal domain

    Hand-crafted feature functions are usually designed based on the domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple, yet robust, 2D convolutional neural network extended to a concatenated 3D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit as a descriptor a pretrained network that yielded the best results on the large and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmark video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time without requiring any preprocessing (e.g., optic flow) or a priori knowledge of data capture (e.g., camera motion estimation), which makes our approach more general and flexible than others. Our implementation is made available.
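    A minimal sketch of the architecture idea, assuming a VGG-16 backbone as the pretrained 2D descriptor (the abstract does not name the network, and the layer sizes below are illustrative, not the authors' exact model): 2D feature maps are computed per frame, concatenated into a volume, and convolved in 3D over the temporal axis.

        # Sketch: per-frame 2D CNN features stacked for 3D convolution.
        # Backbone choice and layer sizes are assumptions.
        import torch
        import torch.nn as nn
        from torchvision import models

        class Concat2D3DNet(nn.Module):
            def __init__(self, num_classes):
                super().__init__()
                # Pretrained 2D descriptor, applied independently to each frame.
                self.frame_features = models.vgg16(weights="IMAGENET1K_V1").features
                self.temporal = nn.Sequential(
                    nn.Conv3d(512, 256, kernel_size=3, padding=1),  # 3D conv over (T, h, w)
                    nn.ReLU(inplace=True),
                    nn.AdaptiveAvgPool3d(1),
                )
                self.classifier = nn.Linear(256, num_classes)

            def forward(self, video):  # video: (B, T, 3, H, W)
                b, t = video.shape[:2]
                feats = self.frame_features(video.flatten(0, 1))            # (B*T, 512, h, w)
                feats = feats.unflatten(0, (b, t)).permute(0, 2, 1, 3, 4)   # (B, 512, T, h, w)
                return self.classifier(self.temporal(feats).flatten(1))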

    Perceiving dynamic environments: from surface geometry to semantic representation

    Perceiving human environments is becoming increasingly fundamental with the gradual adaptation of robots for domestic use. High-level tasks such as the recognition of objects and actions need to be performed for the active engagement of the robot with its surroundings. Nowadays, the environment is primarily captured using visual information in the form of color and depth images. Visual cues obtained from these images serve as a base upon which perception-related applications are developed; for example, appearance models are used for detecting objects, and motion information is extracted for recognizing actions. However, given the complex variations of naturally occurring scenes, extracting a set of robust visual cues becomes harder here than in other contexts. In this thesis, we develop a hierarchy of tools to improve the different aspects of robot perception in human-centered, possibly dynamic, environments. We start with the segmentation of single images and extend it to videos. Afterwards, we develop a surface tracking approach that incorporates our video segmentation method. We then investigate the higher-level tasks of semantic segmentation and recognition. Finally, we focus on recognizing actions in videos. The introduction of Kinect-style depth sensors is relatively new, and their usage in the field of robotics cannot be found before half a decade ago. Such sensors enable the acquisition of high-resolution color and depth images at a low cost. Given this opportunity, we dedicate the bulk of our work to the exploitation of the depth information obtained using such sensors, thereby pushing forward the state of the art in perception problems.
    The thesis is conceptually grouped into two parts. In the first part, we address the low-level tasks of segmentation and tracking with depth images. In many cases, depth data gives a better disambiguation of the surface boundaries of different objects in a scene when compared to their color counterpart. We exploit this information in a novel depth segmentation scheme that fits quadratic surface models to different surfaces in a competing fashion. We further extend the method to the video domain by initializing the segmentation results and surface model parameters of the next frame from the previous frame. In this way, we create a video segmentation algorithm in which the segment label belonging to each surface remains coherent over time. We also devise a particle-filter-based tracker that uses depth data to track a surface. The tracker is made more robust by combining it with our video segmentation approach. The segmentation results serve as a useful prior for high-level tasks. In the second part, we deal with such tasks, which include (i) object recognition, (ii) pixelwise object class segmentation, and (iii) action recognition. We propose (i) to address object recognition by creating context-aware conditional random field models. We show the importance of context in object recognition by modeling geometrical relations between different objects in a scene. We perform (ii) object class segmentation using a convolutional neural network. We introduce a novel distance-from-wall feature and demonstrate its effectiveness in generating better class proposals for objects that are close to walls. The final part of the thesis deals with (iii) action recognition. We propose a 2D convolutional neural network extended to a concatenated 3D network that learns to extract features from the spatio-temporal domain of raw video data. The network is trained to predict an action label for each video. In summary, several perception aspects are addressed with the utilization of depth information where available. Our main contributions are (a) the introduction of a depth video segmentation scheme, (b) a graphical model for object recognition, and our proposals of deep learning models for (c) object class segmentation and (d) action recognition.
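    As an illustration of the distance-from-wall feature mentioned above, one plausible reading (an assumption, since the abstract does not give the exact definition) is a per-pixel Euclidean distance to the nearest pixel labelled as wall, computable with a distance transform:

        # Hypothetical sketch of a distance-from-wall pixel feature.
        import numpy as np
        from scipy.ndimage import distance_transform_edt

        def distance_from_wall(wall_mask: np.ndarray) -> np.ndarray:
            """wall_mask: (H, W) bool array, True where a pixel is wall.
            Returns each pixel's distance to the nearest wall pixel."""
            # distance_transform_edt measures distance to the nearest zero,
            # so invert the mask to make wall pixels the zeros.
            return distance_transform_edt(~wall_mask)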

    Joint segmentation and tracking of object surfaces in depth movies along human/robot manipulations

    A novel framework for joint segmentation and tracking of object surfaces in depth videos is presented. Initially, the 3D colored point cloud obtained using the Kinect camera is used to segment the scene into surface patches, defined by quadratic functions. The computed segments, together with their functional descriptions, are then used to partition the depth image of the subsequent frame in a manner consistent with the preceding frame. This way, solutions established in previous frames can be reused, which improves the efficiency of the algorithm and the coherency of the segmentations along the movie. The algorithm is tested on scenes showing human and robot manipulations of objects. We demonstrate that the method can successfully segment and track the human/robot arm and object surfaces along the manipulations. The performance is evaluated quantitatively by measuring the temporal coherency of the segmentations and the segmentation covering using ground truth. The method provides a visual front-end designed for robotic applications, and can potentially be used in the context of manipulation recognition, visual servoing, and robot-grasping tasks.
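    A minimal sketch of the surface-model side of this approach (the competing multi-surface segmentation itself is not shown): fitting a quadratic function z = f(x, y) to a patch of 3D points by linear least squares, with per-point residuals usable as a score when surfaces compete for points.

        # Sketch: least-squares fit of z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f.
        import numpy as np

        def quadratic_design(pts: np.ndarray) -> np.ndarray:
            x, y = pts[:, 0], pts[:, 1]
            return np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])

        def fit_quadratic_surface(pts: np.ndarray) -> np.ndarray:
            """pts: (N, 3) points; returns coefficients (a, b, c, d, e, f)."""
            coeffs, *_ = np.linalg.lstsq(quadratic_design(pts), pts[:, 2], rcond=None)
            return coeffs

        def residuals(pts: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
            # Per-point fit error of a patch against a surface model.
            return np.abs(quadratic_design(pts) @ coeffs - pts[:, 2])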

    Realtime tracking and grasping of a moving object from range video

    In this paper we present an automated system that is able to track and grasp a moving object within the workspace of a manipulator, using range images acquired with a Microsoft Kinect sensor. Realtime tracking is achieved by a geometric particle filter on the affine group. Based on the tracked output, the pose of a 7-DoF WAM robotic arm is continuously updated using dynamic motor primitives until a distance measure between the tracked object and the gripper mounted on the arm falls below a threshold; the gripper then closes its three fingers and grasps the object. The tracker works in real time and is robust to noise and partial occlusions. Using only the depth data makes our tracker independent of texture, which is one of the key design goals in our approach. An experimental evaluation is provided, along with a comparison of the proposed tracker with state-of-the-art approaches, including the OpenNI tracker. The developed system is integrated with ROS and made available as part of IRI's ROS stack.
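    The grasp-triggering logic described above reduces to a simple loop; a hedged sketch, with all object and method names invented for illustration (the real system runs as ROS nodes):

        # Illustrative control loop: servo toward the tracked pose, then grasp.
        import numpy as np

        GRASP_THRESHOLD = 0.02  # metres; an assumed value, not from the paper

        def track_and_grasp(tracker, arm, gripper):
            while True:
                obj_pose = tracker.latest_pose()   # 4x4 pose from the particle filter
                arm.move_toward(obj_pose)          # continuous arm update
                d = np.linalg.norm(arm.gripper_position() - obj_pose[:3, 3])
                if d < GRASP_THRESHOLD:            # distance measure below threshold
                    gripper.close()                # close the three fingers
                    break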

    Semantic segmentation priors for object discovery

    Reliable object discovery in realistic indoor scenes is a necessity for many computer vision and service robot applications. Semantic segmentation methods for such scenes have made huge advances in recent years, and they can provide useful prior information for object discovery by removing false positives and by delineating object boundaries. We propose a novel method that combines bottom-up object discovery and semantic priors for producing generic object candidates in RGB-D images. We use a deep learning method for semantic segmentation to classify colour and depth superpixels into meaningful categories. Separately for each category, we use saliency to estimate the location and scale of objects, and superpixels to find their precise boundaries. Finally, object candidates of all categories are combined and ranked. We evaluate our approach on the NYU Depth V2 dataset and show that we outperform other state-of-the-art object discovery methods in terms of recall.
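    A schematic, simplified sketch of the pipeline (working on pixels and a fixed saliency threshold instead of the paper's superpixels and scale estimation; both simplifications are assumptions):

        # Per semantic category, extract salient blobs as object candidates
        # and rank all candidates by mean saliency.
        import numpy as np
        from scipy import ndimage

        def discover_objects(label_map, saliency, threshold=0.5):
            """label_map: (H, W) int semantic classes; saliency: (H, W) in [0, 1]."""
            candidates = []
            for category in np.unique(label_map):
                salient = (saliency > threshold) & (label_map == category)
                components, n = ndimage.label(salient)  # connected blobs per category
                for i in range(1, n + 1):
                    mask = components == i
                    candidates.append((mask, float(saliency[mask].mean())))
            # Combine candidates of all categories and rank them.
            return sorted(candidates, key=lambda c: -c[1])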

    Global age-sex-specific mortality, life expectancy, and population estimates in 204 countries and territories and 811 subnational locations, 1950–2021, and the impact of the COVID-19 pandemic: a comprehensive demographic analysis for the Global Burden of Disease Study 2021

    Background: Estimates of demographic metrics are crucial to assess levels and trends of population health outcomes. The profound impact of the COVID-19 pandemic on populations worldwide has underscored the need for timely estimates to understand this unprecedented event within the context of long-term population health trends. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 provides new demographic estimates for 204 countries and territories and 811 additional subnational locations from 1950 to 2021, with a particular emphasis on changes in mortality and life expectancy that occurred during the 2020–21 COVID-19 pandemic period. Methods: 22 223 data sources from vital registration, sample registration, surveys, censuses, and other sources were used to estimate mortality, with a subset of these sources used exclusively to estimate excess mortality due to the COVID-19 pandemic. 2026 data sources were used for population estimation. Additional sources were used to estimate migration; the effects of the HIV epidemic; and demographic discontinuities due to conflicts, famines, natural disasters, and pandemics, which are used as inputs for estimating mortality and population. Spatiotemporal Gaussian process regression (ST-GPR) was used to generate under-5 mortality rates, which synthesised 30 763 location-years of vital registration and sample registration data, 1365 surveys and censuses, and 80 other sources. ST-GPR was also used to estimate adult mortality (between ages 15 and 59 years) based on information from 31 642 location-years of vital registration and sample registration data, 355 surveys and censuses, and 24 other sources. Estimates of child and adult mortality rates were then used to generate life tables with a relational model life table system. For countries with large HIV epidemics, life tables were adjusted using independent estimates of HIV-specific mortality generated via an epidemiological analysis of HIV prevalence surveys, antenatal clinic serosurveillance, and other data sources. Excess mortality due to the COVID-19 pandemic in 2020 and 2021 was determined by subtracting observed all-cause mortality (adjusted for late registration and mortality anomalies) from the mortality expected in the absence of the pandemic. Expected mortality was calculated based on historical trends using an ensemble of models. In location-years where all-cause mortality data were unavailable, we estimated excess mortality rates using a regression model with covariates pertaining to the pandemic. Population size was computed using a Bayesian hierarchical cohort component model. Life expectancy was calculated using age-specific mortality rates and standard demographic methods. Uncertainty intervals (UIs) were calculated for every metric using the 25th and 975th ordered values from a 1000-draw posterior distribution. Findings: Global all-cause mortality followed two distinct patterns over the study period: age-standardised mortality rates declined between 1950 and 2019 (a 62·8% [95% UI 60·5–65·1] decline), and increased during the COVID-19 pandemic period (2020–21; 5·1% [0·9–9·6] increase). In contrast with the overall reversal of mortality trends during the pandemic period, child mortality continued to decline, with 4·66 million (3·98–5·50) global deaths in children younger than 5 years in 2021 compared with 5·21 million (4·50–6·01) in 2019.
An estimated 131 million (126–137) people died globally from all causes in 2020 and 2021 combined, of which 15·9 million (14·7–17·2) were due to the COVID-19 pandemic (measured by excess mortality, which includes deaths directly due to SARS-CoV-2 infection and those indirectly due to other social, economic, or behavioural changes associated with the pandemic). Excess mortality rates exceeded 150 deaths per 100 000 population during at least one year of the pandemic in 80 countries and territories, whereas 20 nations had a negative excess mortality rate in 2020 or 2021, indicating that all-cause mortality in these countries was lower during the pandemic than expected based on historical trends. Between 1950 and 2021, global life expectancy at birth increased by 22·7 years (20·8–24·8), from 49·0 years (46·7–51·3) to 71·7 years (70·9–72·5). Global life expectancy at birth declined by 1·6 years (1·0–2·2) between 2019 and 2021, reversing historical trends. An increase in life expectancy was observed in only 32 (15·7%) of 204 countries and territories between 2019 and 2021. The global population reached 7·89 billion (7·67–8·13) people in 2021, by which time the populations of 56 of 204 countries and territories had peaked and subsequently declined. The largest proportion of population growth between 2020 and 2021 was in sub-Saharan Africa (39·5% [28·4–52·7]) and south Asia (26·3% [9·0–44·7]). From 2000 to 2021, the ratio of the population aged 65 years and older to the population aged younger than 15 years increased in 188 (92·2%) of 204 nations. Interpretation: Global adult mortality rates markedly increased during the COVID-19 pandemic in 2020 and 2021, reversing past decreasing trends, while child mortality rates continued to decline, albeit more slowly than in earlier years. Although COVID-19 had a substantial impact on many demographic indicators during the first 2 years of the pandemic, overall global health progress over the 72 years evaluated has been profound, with considerable improvements in mortality and life expectancy. Additionally, we observed a deceleration of global population growth since 2017, despite steady or increasing growth in lower-income countries, combined with a continued global shift of population age structures towards older ages. These demographic changes will likely present future challenges to health systems, economies, and societies. The comprehensive demographic estimates reported here will enable researchers, policy makers, health practitioners, and other key stakeholders to better understand and address the profound changes that have occurred in the global health landscape following the first 2 years of the COVID-19 pandemic, and longer-term trends beyond the pandemic.
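    As a worked illustration of two definitions in the abstract: excess mortality is observed minus expected deaths, and the 95% uncertainty interval is taken from the 25th and 975th ordered values of 1000 posterior draws. The numbers below are invented for illustration only.

        # Excess mortality and 95% UI from 1000 posterior draws (toy numbers).
        import numpy as np

        rng = np.random.default_rng(0)
        expected_draws = rng.normal(56.0, 1.5, size=1000)  # expected deaths, millions
        observed = 65.5                                    # observed all-cause deaths, millions

        excess_draws = observed - expected_draws           # excess mortality per draw
        ordered = np.sort(excess_draws)
        lo, hi = ordered[24], ordered[974]                 # 25th and 975th ordered values
        print(f"excess: {excess_draws.mean():.1f} (95% UI {lo:.1f}-{hi:.1f}) million")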

    Evaluation of methods for 3D environment reconstruction with respect to navigation and manipulation tasks for mobile robots

    The field of 3D environment reconstruction has been the subject of various research activities in recent years, and the applications for mobile robots are manifold. First, for navigation tasks (especially SLAM), the perception of 3D obstacles has many advantages over navigation in 2D maps, as is commonly done: objects hanging above the ground can be recognized, and the robot gains much more information about its operating area, which makes localization easier. Second, in the field of tele-operation of robots, a visualization of the environment in three dimensions helps the tele-operator perform tasks; a consistent, dynamically updated environment model is therefore crucial. Third, for mobile manipulation in a dynamic environment, on-line obstacle detection and collision avoidance can be realized if the environment is known. In recent research activities, various approaches to 3D environment reconstruction have evolved. Two of the most promising methods are FastSLAM and 6D SLAM, both capable of building dense 3D environment maps on-line. The first uses a particle filter applied to extracted features, in combination with a robot system model and a measurement model, to reconstruct a map. The second works on 3D point cloud data and reconstructs an environment using the ICP algorithm. Both of these methods are implemented in GNU C++. First, FastSLAM is implemented; object-oriented programming is used to build up the particle and Extended Kalman filters. Second, 6D SLAM is implemented; the concept of inheritance in C++ is used to make the implementation of the ICP algorithm as generic as possible. To test our implementation, a mobile robot called Care-O-bot 3 is used. The mobile robot is equipped with a color camera and a time-of-flight camera. Data sets are recorded as the robot moves in different environments, and our implementations of FastSLAM and 6D SLAM are used to reconstruct the maps.
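    For reference, a compact sketch of the point-to-point ICP scheme that the 6D SLAM part relies on, shown in Python for brevity (the thesis implements it in C++, and its convergence and data-structure details are not reproduced):

        # Point-to-point ICP: nearest-neighbour matching plus an SVD-based
        # rigid alignment step, iterated a fixed number of times.
        import numpy as np
        from scipy.spatial import cKDTree

        def icp(source, target, iterations=30):
            """Align source (N, 3) to target (M, 3); returns rotation R and translation t."""
            R, t = np.eye(3), np.zeros(3)
            tree = cKDTree(target)
            src = source.copy()
            for _ in range(iterations):
                _, idx = tree.query(src)               # closest target point per source point
                matched = target[idx]
                mu_s, mu_m = src.mean(0), matched.mean(0)
                H = (src - mu_s).T @ (matched - mu_m)  # cross-covariance of centred matches
                U, _, Vt = np.linalg.svd(H)
                R_step = Vt.T @ U.T
                if np.linalg.det(R_step) < 0:          # guard against reflections
                    Vt[-1] *= -1
                    R_step = Vt.T @ U.T
                t_step = mu_m - R_step @ mu_s
                src = src @ R_step.T + t_step          # apply the incremental transform
                R, t = R_step @ R, R_step @ t + t_step # compose with the running estimate
            return R, t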
